1 Introduction


Obstacle detection is an area of considerable interest in such applications as rotorcraft
nap-of-the-earth navigation and spacecraft landing. There are mainly two possible approaches
to the associated vision problem, one based on the optical flow resulting from the motion 
of a single image sensor, the other based on stationary stereo. Both approaches have been 
taken in previous works on passive ranging (e.g., [1, 2, 3, 4]). In this paper we take the single 
sensor, or monocular vision approach. 

The motion of an imaging sensor causes each point of the scene to describe a time 
trajectory in the image plane. The trajectories of all image points constitute the optical 
flow. An imaging sensor, such as a TV camera or a Forward-Looking Infra-Red (FLIR) sensor, is
typically used as the source of optical flow data. Similarly to other passive-ranging methods, 
we assume that the scene and its illumination sources are temporally stationary (see [5]). 

The optical flow at any given point in the image plane may consist of three kinds
of motion: lateral translation, expansion (or divergence), and rotation (or curl). When
considering the vicinity of any given point, its time evolution can be described approximately
by an affine transformation which, in the most general case, is defined by six parameters:
four belonging to the 2 x 2 multiplying matrix, and two belonging to the vector of lateral
translation. Most depth-estimation methods, such as those described in [6, 7], make use of
the lateral motion alone. Two basic limitations are implicit in these methods. They perform
poorly in the image plane close to the focus of expansion, and they can only use a relatively
short distance between frames. As shown in earlier work [8], the depth estimation error is
inversely proportional to this distance.


The idea of using divergence (or expansion) as a source of depth information is not
new. The works of Longuet-Higgins and Prazdny [9], Prazdny [10, 11], Koenderink [12],
Koenderink and van Doorn [13, 14], and Nelson and Aloimonos [15] elaborate extensively
on this subject. An extension of the local divergence approach was recently reported in [16],
where a diffusion process is used to roughly delineate objects and “paint” them according
to the imminence of collision. This technique is especially useful for objects that have
well-defined contours.


In this work we do not try to estimate the numerical depth value, but, rather, we try 
to classify an obstacle as being “safe” or “dangerous”, using a pattern-recognition approach. 
We rely on the objects being textured, and extract information from local expansion as
illustrated by Figure 1. We look for patterns associated with high or low values of the
local divergence. Since the latter is calculable from the image-plane temporal and spatial
derivatives, we use those as the components of the pattern vector. Reducing these components
to their sign function, we obtain binary pattern vectors which prove to be sufficient
for our obstacle detection purposes. We employ two recently developed techniques, reported
in [17] and [18], for classifying the pattern vectors as representing safe or dangerous
situations. These techniques are based on expanding the patterns into high dimensional space for
high storage capacity, and applying learning techniques, previously developed in the neural
networks literature, to the resulting sparse representations. The first technique, employing
Rosenblatt’s learning rule, offers relatively accurate results at the cost of relatively slow
computation, while the other, employing the so-called Hebbian learning rule, is considerably
faster but less accurate.


2 The divergence equations 


In this section we present the mathematical background necessary to understand the
particular choice of variables for the pattern vectors used by our recognition method.

Denoting by $x$ and $y$ the cartesian coordinates of points in the image plane, the
divergence of the image-plane velocity vector (also referred to as the optical flow),
$v = [v_x \;\; v_y]^T$, at some given point $[x \;\; y]^T$ is known to satisfy [16]

$$\nabla \cdot v = \frac{\partial v_x}{\partial x} + \frac{\partial v_y}{\partial y} = \frac{2}{\tau} \tag{1}$$

where $\nabla = [\partial/\partial x \;\; \partial/\partial y]^T$ and $\tau$ is the time to collision. The velocity
divergence, then, provides what we are looking for.
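As a numerical illustration of (1), the following Python sketch evaluates the divergence of a purely expanding flow and inverts it to recover the time to collision. The flow model $v = [x/\tau \;\; y/\tau]^T$ (head-on approach to a frontal surface), the function names and the finite-difference step are assumptions made for this example and are not taken from the paper.

```python
def radial_flow(x, y, tau):
    """Image velocity at (x, y) for a head-on approach (assumed flow model)."""
    return x / tau, y / tau


def divergence(flow, x, y, h=1e-3):
    """Central-difference estimate of dvx/dx + dvy/dy at (x, y)."""
    dvx_dx = (flow(x + h, y)[0] - flow(x - h, y)[0]) / (2 * h)
    dvy_dy = (flow(x, y + h)[1] - flow(x, y - h)[1]) / (2 * h)
    return dvx_dx + dvy_dy


def time_to_collision(div):
    """Invert equation (1): div = 2 / tau."""
    return 2.0 / div


tau_true = 80.0  # frames; e.g. 40 m away at 0.5 m per frame
div = divergence(lambda x, y: radial_flow(x, y, tau_true), 10.0, -5.0)
tau_est = time_to_collision(div)
```

For this flow model the divergence is the same at every image point, which is why a local measurement is enough to estimate the time to collision.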


Let us now examine what measurements and relationships are available for calculating
the divergence from its definition. The brightness constancy equation ([5]) is

$$\frac{\partial I}{\partial x}\frac{dx}{dt} + \frac{\partial I}{\partial y}\frac{dy}{dt} = -\frac{\partial I}{\partial t} \tag{2}$$

where $I$ denotes brightness, or gray level, and $t$ denotes time. In vector form

$$v^T \nabla I = -\frac{\partial I}{\partial t} \tag{3}$$

where $\nabla I = [\partial I/\partial x \;\; \partial I/\partial y]^T$. Equation (3) does not have a unique solution for the
velocity vector; it is satisfied by any vector having a component $v_I$ parallel to $\nabla I$.
This is the “aperture problem”. Solving for this component, we have

$$v_I = -\frac{\partial I/\partial t}{|\nabla I|^2}\,\nabla I \tag{4}$$

where $|\cdot|$ denotes the length of a vector.
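The aperture problem can be made concrete with a small Python sketch that computes the flow component along the brightness gradient from the three first derivatives. The function name and the handling of a vanishing gradient are assumptions of this example.

```python
def normal_flow(ix, iy, it):
    """Flow component parallel to the brightness gradient.

    ix, iy, it are the partial derivatives of brightness with respect to
    x, y and t. Returns None where the gradient vanishes, since the
    brightness constancy equation then carries no velocity information.
    """
    g2 = ix * ix + iy * iy
    if g2 == 0.0:
        return None
    scale = -it / g2
    return scale * ix, scale * iy


# A vertical edge (gradient along x) moving with true velocity (2, 1):
# brightness constancy gives it = -(2 * 1 + 1 * 0) = -2.
recovered = normal_flow(1.0, 0.0, -2.0)
```

Here `recovered` equals `(2.0, 0.0)`: the motion along the edge (the y component of the true velocity) is invisible to a single local measurement.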


Since the direction of the local edge is generally unrelated to the local velocity vector, we 
cannot get the latter from (4). One possible way to circumvent both the aperture problem 
and the random nature of the edges is to apply (4) at least twice to close-by local edges, 
assuming that they experience the same v but have different orientations. Rewriting (4) 
twice for two such edges yields two equations with two unknowns — the two components 
of v itself. Taking one step further, the same equation can be applied many (say n) times 
and the two velocity components for some image point can be obtained from a least-squares 
solution of this equation set, which may be written in matrix form as 


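$$A\,[v_x \;\; v_y]^T = B \tag{5}$$

where

$$A = \begin{bmatrix} (\partial I/\partial x)_1 & (\partial I/\partial y)_1 \\ \vdots & \vdots \\ (\partial I/\partial x)_n & (\partial I/\partial y)_n \end{bmatrix} \quad \text{and} \quad B = -\begin{bmatrix} (\partial I/\partial t)_1 \\ \vdots \\ (\partial I/\partial t)_n \end{bmatrix} \tag{6}$$

The least-squares solution of (5) is

$$[v_x \;\; v_y]^T = (A^T A)^{-1} A^T B \tag{7}$$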

Equation (7) has a unique solution if and only if the rank of A is 2, which means that at 
least two edges are oriented differently. The practical problems with such an approach are 
that it depends on the successful association of all the n points with a single object, and 
that the derivatives used in the solution are generally noisy. To suppress noise, one would 
like to use a large number of equations, but this entails identifying a large image area as 
belonging to the same object through some form of image segmentation. Since methods of 
image segmentation have their own pitfalls, a practical approach would be to assume that 
all points within a window of a certain size experience the same v. 
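The windowed least-squares estimate described above can be sketched in a few lines of Python. This is a minimal illustration under the stated assumption that all points in the window share the same velocity; the function name, the determinant tolerance and the sample data are hypothetical.

```python
def flow_least_squares(gradients, tol=1e-12):
    """Solve (A^T A) v = A^T B for v = [vx, vy].

    gradients: iterable of (Ix, Iy, It) triples, one per point in the window.
    Row i of A is [Ix_i, Iy_i] and entry i of B is -It_i.
    Returns None when A has rank below 2 (all edges share one orientation).
    """
    sxx = sxy = syy = bx = by = 0.0
    for ix, iy, it in gradients:
        sxx += ix * ix
        sxy += ix * iy
        syy += iy * iy
        bx -= ix * it
        by -= iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < tol:
        return None
    vx = (syy * bx - sxy * by) / det
    vy = (sxx * by - sxy * bx) / det
    return vx, vy


# Three differently oriented edges, all moving with velocity (2, 1);
# each It follows from brightness constancy, It = -(Ix * 2 + Iy * 1).
window = [(1.0, 0.0, -2.0), (0.0, 1.0, -1.0), (1.0, 1.0, -3.0)]
v = flow_least_squares(window)

# Two parallel edges leave the system rank deficient.
degenerate = flow_least_squares([(1.0, 0.0, -2.0), (2.0, 0.0, -4.0)])
```

With noise-free derivatives the three edges recover the velocity exactly; with noisy derivatives the same formula averages over the window, which is the motivation for using many points.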


In this work we suggest an approach which, rather than solving equations, regards the 
relevant information as a pattern to be associated with either small or large divergence
values. The question is, what kind of information should constitute a pattern. Suppose that
$v$ coincides with $v_I$. Applying (1) to (4), we have

$$\frac{2}{\tau} = -\frac{\partial}{\partial x}\left(\frac{\partial I/\partial t}{|\nabla I|^2}\,\frac{\partial I}{\partial x}\right) - \frac{\partial}{\partial y}\left(\frac{\partial I/\partial t}{|\nabla I|^2}\,\frac{\partial I}{\partial y}\right) \tag{8}$$

which indicates that the relevant information for calculating divergence is captured by the
first and second spatial and temporal derivatives of the image light intensity (represented by
gray levels), that is by $\partial I/\partial x$, $\partial I/\partial y$, $\partial I/\partial t$, $\partial^2 I/\partial x^2$, $\partial^2 I/\partial y^2$, $\partial^2 I/\partial x \partial y$,
$\partial^2 I/\partial x \partial t$, $\partial^2 I/\partial y \partial t$. Notice that these derivatives are
required at the image point under consideration. Alternatively, it can be seen that only first
spatial and temporal derivatives of $I$ are required in the least squares solution represented
by (7). However, these derivatives are required at several points in the neighborhood of the
image point.


Guided by the latter observation, we chose to use the three first derivatives at n = 13 
points of a grid, centered at the desired image point, as the components of the pattern vector 
on which classification is to be performed. Using noisy derivatives and real-valued pattern 
vectors, which do not lend themselves easily to classification, presents a potential difficulty. 
In this work we show that, for classification purposes, the real-valued components of the 
derivative vectors can be reduced to binary ones, representing the signs of those derivatives. 
The input vector to the classification algorithm will consist, then, of 39 binary components. 
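The reduction from real-valued derivatives to a 39-component binary pattern can be sketched as follows. The ordering of components (point by point, spatial derivatives first) and the mapping of an exactly zero derivative to +1 are assumptions of this example; the paper does not specify either.

```python
def sign(value):
    """Map a real derivative to +1 or -1 (zero goes to +1 by assumption)."""
    return 1 if value >= 0 else -1


def pattern_vector(derivatives):
    """Build the binary pattern vector for one image point.

    derivatives: list of 13 (Ix, Iy, It) triples, one per grid point.
    Returns 39 components, each +1 or -1, in grid-point order.
    """
    if len(derivatives) != 13:
        raise ValueError("expected derivatives at 13 grid points")
    return [sign(d) for triple in derivatives for d in triple]


u = pattern_vector([(0.3, -1.2, 0.0)] * 13)
```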

As we have shown in this section, the direct solution of the divergence equations is quite
involved. Our experience shows that a numerical solution of equations (1) and (7) does not
yield satisfactory results. It is therefore remarkable that the neural classifiers, described
next, were able to produce correct detection results with a high rate of success, using the
same information.


3 The pattern recognition algorithms 


We employ two recently proposed classifiers, each offering a certain advantage over the other. 
The first, reported in [17], is based on transforming the input data into sparse binary internal 
representations, and learning the class assignment of the latter by applying Rosenblatt’s rule. 
The second, reported in [18], is also based on sparse binary internal representation of the 
input data, but the learning rule is Hebbian. The first method, employing discriminatory 
storage of only those input vectors which are classified incorrectly, may require many training 
epochs. The second, employing indiscriminatory storage of the training input-output pairs,
requires a single pass over the training set. Consequently, the second method is usually
considerably faster than the first, which will, however, produce more accurate results, given 
a sufficiently long processing time. These relative advantages of the two techniques apply to 
both their hardware and software implementations.

Figure 2: Classifier structure.


Both methods have the structure depicted in Fig. 2. A given 39-dimensional input
vector $u$, whose components are +1 or -1, is transformed into a sparse $N$-dimensional
binary vector $x$, whose components are 0 or 1, by a layer of $N$ internal cells. The size $N$ of
the internal layer is to be determined. Each internal cell performs the function

$$x_i = \begin{cases} 1 & \text{if } (w^{(i)}, u) \ge t \\ 0 & \text{if } (w^{(i)}, u) < t \end{cases} \tag{9}$$

where $w^{(i)} \in \{\pm 1\}^n$ is a vector with randomly chosen binary components, $(w^{(i)}, u)$ denotes
the inner product between the two vectors and $t$ is a positive scalar. The resulting vector $x$
is the internal representation of the input vector $u$. The input weights of the internal cells
define the centers of spheres of Hamming radius $r = 0.5(n - t)$ in $\{\pm 1\}^n$, so that the cell’s
output is 1 when the input falls within the sphere and 0 otherwise. (It should be noted that
the activation radius $r$ is selected first as an integer, then $t$ is obtained as $t = n - 2r$.)
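The internal layer can be sketched in Python as a set of random centers and a threshold test. The sketch also checks the geometric reading of the threshold: for vectors in $\{\pm 1\}^n$ the inner product equals $n - 2d$, where $d$ is the Hamming distance, so the test against $t = n - 2r$ is the same as $d \le r$. The function names, the seed, the rounding of the radius to 12 and the reduced number of cells are assumptions of this example.

```python
import random


def random_centers(num_cells, n, seed=0):
    """Draw the cells' input weight vectors uniformly from {+1, -1}^n."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(num_cells)]


def hamming(a, b):
    """Number of positions where two +/-1 vectors differ."""
    return sum(1 for ai, bi in zip(a, b) if ai != bi)


def encode(u, centers, r):
    """Internal representation of u: x_i = 1 iff (w_i, u) >= t, t = n - 2r."""
    t = len(u) - 2 * r
    return [1 if sum(wi * ui for wi, ui in zip(w, u)) >= t else 0
            for w in centers]


n, r = 39, 12                     # r = 12 is roughly 0.3 * n (illustrative rounding)
centers = random_centers(200, n)  # 200 cells keep the demo small
u = centers[0]                    # an input lying exactly on the first center
x = encode(u, centers, r)
```

Most entries of `x` are 0 for a typical input, which is what makes the representation sparse.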


The output function, representing the class assignment of $u$, is

$$y = \begin{cases} 1 & \text{if } (w, x) \ge 0 \\ 0 & \text{if } (w, x) < 0 \end{cases} \tag{10}$$

where $w_j$, the $j$-th element of $w$, is the weight of the connection between the $j$-th internal
cell and the output, to be determined by the learning process.


The centers of the spheres represented by the internal cells are chosen randomly. The 
spheres should cover the input space with high probability. The minimal number of spheres 
that cover the space $\{\pm 1\}^n$ with probability of, at least, $1 - 2^{-\epsilon n}$, was derived in [19] as

$$N \ge \frac{(1 + \epsilon)\, n\, 2^n \ln 2}{S_n(r)} = (1 + \epsilon)\, n\, 2^{n[1 - h_2(\rho)] + O(\log n)} \ln 2 \tag{11}$$

where

$$S_n(r) = \sum_{i=0}^{r} \binom{n}{i} = 2^{n h_2(\rho) + O(\log n)} \tag{12}$$

is the volume of a sphere of radius $r$ in $\{\pm 1\}^n$, $\rho = r/n$, $h_2(\rho) = -\rho \log_2(\rho) - (1 - \rho) \log_2(1 - \rho)$,
and $O(\log n) < c \log n$ for some positive scalar $c$.

The size of the minimal covering code for $n = 39$ and for covering probability 0.99 is
plotted in Fig. 3 against the relative activation radius $\rho$, in terms of $\log N$. It can be seen
that, for $\rho = 0.3$, which was found to be adequate from discretization considerations [18],
we have $\log N = 0.3755$, hence, $N = 5600$. These are the values that will be used by the
proposed classifiers.


Figure 3: Minimal covering code size for input dimension n = 39 (vertical axis: log N).
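The exact (non-asymptotic) form of the covering bound can be evaluated directly, as in the Python sketch below. The function names are hypothetical, and $\epsilon$ is obtained here by setting $1 - 2^{-\epsilon n}$ equal to the requested covering probability, which is an assumption about how the probability enters. The result depends on how the radius $r$ is rounded from $\rho n$, so this sketch is not claimed to reproduce the curve of Fig. 3.

```python
import math


def sphere_volume(n, r):
    """Number of points of {+1, -1}^n within Hamming distance r of a center."""
    return sum(math.comb(n, i) for i in range(r + 1))


def covering_bound(n, r, coverage):
    """Lower bound on the number of random spheres, exact-volume form.

    eps is chosen so that 1 - 2**(-eps * n) equals `coverage` (assumption).
    """
    eps = -math.log2(1.0 - coverage) / n
    return (1.0 + eps) * n * 2 ** n * math.log(2) / sphere_volume(n, r)


bound_r11 = covering_bound(39, 11, 0.99)
bound_r12 = covering_bound(39, 12, 0.99)
```

Larger spheres cover more of the space each, so the bound decreases as the radius grows; the choice of radius therefore trades the number of cells against the coarseness of the discretization.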


Rosenblatt learning 

Given a training set $A$ of pairs $(u^{(i)}, y^{(i)})$, $i = 1, \ldots, M$, where $u^{(i)}$ is an input vector and
$y^{(i)}$ is the correct output for $u^{(i)}$, classification is learnt by applying Rosenblatt’s perceptron
learning rule to the corresponding pairs of internal representations and outputs: Start with
$w_j = 0$, $j = 1, \ldots, N$. Pick a vector $u^{(i)}$ from $A$ and present it as an input. If the output
$y$ agrees with $y^{(i)}$, leave $w$ unchanged and proceed to the next training input. If $y = -1$
and $y^{(i)} = 1$, add the internal representation to the weights vector. If $y = 1$ and $y^{(i)} = -1$,
subtract the internal representation from the weights vector. This learning rule may be
written as

$$w_j \leftarrow \begin{cases} w_j + x_j^{(i)} y^{(i)} & \text{if } y \ne y^{(i)} \\ w_j & \text{if } y = y^{(i)} \end{cases} \tag{13}$$

The perceptron convergence theorem [20] implies that this learning process will converge 
to a final value of the weights vector w, so that the internal representations of the training 
vectors are classified correctly, provided that their assignment is not ambiguous and that such 
a vector exists. The existence of such a vector is guaranteed if the internal representations 
are linearly independent. It may, however, take many iterations for the algorithm to converge 
to final values, which stands in contrast to the storage method described next. 
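The error-driven update can be sketched in Python as follows. The sketch assumes outputs coded as +1 and -1, as in the verbal description of the rule, and breaks ties at a zero inner product in favor of +1; the function names, the toy data and the early stop on an error-free epoch are illustrative choices, not the authors' implementation.

```python
def predict(w, x):
    """Class of an internal representation x, coded +1 or -1 (assumed coding)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def rosenblatt_train(reps, labels, epochs=10):
    """Perceptron learning on internal representations.

    reps: list of 0/1 lists; labels: list of +1 / -1 targets.
    Only misclassified examples change the weights.
    """
    w = [0] * len(reps[0])
    for _ in range(epochs):
        errors = 0
        for x, y_true in zip(reps, labels):
            if predict(w, x) != y_true:
                # Add x when the target is +1, subtract it when the target is -1.
                w = [wj + xj * y_true for wj, xj in zip(w, x)]
                errors += 1
        if errors == 0:
            break
    return w


reps = [[1, 0, 0], [0, 1, 0], [1, 0, 1], [0, 1, 1]]
labels = [1, -1, 1, -1]
w = rosenblatt_train(reps, labels)
```

On this toy set a single correction suffices; on real data several passes may be needed, which is the source of the method's slowness.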

Hebbian storage 

The second learning method calculates the connection weights between the internal cells
and the output by the Hebbian rule:

$$w_j = \sum_{i=1}^{M} x_j^{(i)} y^{(i)}, \qquad j = 1, \ldots, N \tag{14}$$

This is an indiscriminatory correlative storage. Our input-space covering requirement
guarantees that, given sufficiently many training pairs, the entire input space will be represented
by the external weights, so as to associate every input vector with a class. It is shown in [18]
that the learning capacity of this classifier is $2N$, and that it has a substantial generalization
capability. It takes a single pass over the training set (a single training epoch) and is,
consequently, considerably faster than Rosenblatt’s learning. On the other hand, the effective
discretization of the input space generated by this method is coarser, and, consequently,
it will often produce less accurate results than the one based on Rosenblatt learning.
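Hebbian storage amounts to one accumulation pass, as the following Python sketch shows. Targets are assumed to be coded +1 and -1, and the names and toy data are illustrative.

```python
def hebbian_weights(reps, labels):
    """Single-pass correlative storage: w_j is the sum over i of x_j(i) * y(i).

    reps: list of 0/1 internal representations; labels: +1 / -1 targets.
    Every training pair is stored, whether or not it is already classified
    correctly.
    """
    w = [0] * len(reps[0])
    for x, y in zip(reps, labels):
        for j, xj in enumerate(x):
            if xj:
                w[j] += y
    return w


def classify(w, x):
    """Threshold the inner product (w, x); ties go to +1 by assumption."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


reps = [[1, 0, 0], [0, 1, 0], [1, 0, 1], [0, 1, 1]]
labels = [1, -1, 1, -1]
w = hebbian_weights(reps, labels)
```

Because each cell simply accumulates the labels of the training inputs that activate it, the cost is linear in the size of the training set, with no iteration.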


4 Simulation 


4.1 Creating the pattern vectors

A scenario simulation has been developed to generate the required samples. It consists 
of a helicopter flying along a pre-chosen trajectory, including possible maneuvers (that is, 
pitch, yaw and roll components of motion), towards a vertical wall normal to the line of 
sight. The wall has a textured surface, generated by filtering white Gaussian noise through 
a Gaussian-shaped point-spread-function having a 2-pixel spatial correlation width. The 
simulation program allows us to control all the parameters of interest, such as the texture 
fineness, the dynamic range of the gray-levels and the distance from the focus of expansion. 
Figure 4 shows frames 12 and 24 of the simulated textured wall as it is seen in forward flight.

Figure 4: Frames 12 and 24 of simulated textured wall as seen in forward flight

We scan this wall in order to pick up samples for training the classifier. A single sample
vector is composed of the first spatial and temporal derivatives, $\partial I/\partial x$, $\partial I/\partial y$, $\partial I/\partial t$, for 13 grid points
centered at a given sample point. The grid arrangement is shown in Figure 5. We note that,
since the first spatial derivatives are given for each of the grid points, information about the
second spatial derivatives at the center point is also implicit in the data. Next, we replace
the real values of the 39 vector components by their signs, representing a negative sign by
-1 and a positive sign by 1. To each sample vector we attribute an output value of -1 if it
corresponds to a safe distance and +1 if it corresponds to an unsafe one.

Figure 5: Grid points at which first derivatives are taken to construct a pattern vector for
the center point

The simulated flights, at a speed of 0.5 m/f (meters per frame), start 150 m from the textured
wall with the camera initially pointing to the wall center. The flights end at a distance of 40
m from the wall. Maneuvering flights exercise a roll of 0.02 rad/f in addition to pitch and
yaw motions at rates of 0.0005 rad/f. In one set of experiments, the close (or dangerous)
zone was defined as the points whose distances from the wall are between 40 m and 70 m,
while the far (or safe) zone was defined as the points at distances between 110 m and 150 m.
In another set, points in the range 40 - 80 m were considered close and those between 90 m
and 130 m were considered far. The second set, having a narrow deadband of only 10 m between
the two zones, is considerably more challenging than the first, having a 40 m deadband.
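The zone definitions translate into a simple labeling function, sketched below in Python. The output coding (+1 for the close zone, -1 for the far zone) follows the sample coding used for training; treating the zone limits as inclusive and returning `None` inside the deadband are assumptions of this example.

```python
def zone_label(distance_m, close=(40.0, 70.0), far=(110.0, 150.0)):
    """Label a sample by its distance from the wall.

    Returns +1 for the close (dangerous) zone, -1 for the far (safe) zone,
    and None for the deadband, whose samples are not used.
    """
    if close[0] <= distance_m <= close[1]:
        return 1
    if far[0] <= distance_m <= far[1]:
        return -1
    return None


# First zone set (40 m deadband) and second zone set (10 m deadband).
wide = [zone_label(d) for d in (55.0, 90.0, 120.0)]
narrow = [zone_label(d, close=(40.0, 80.0), far=(90.0, 130.0))
          for d in (75.0, 85.0, 120.0)]
```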

Each simulation generates 220 images of 128 x 128 pixels, separated by a distance 
of 0.5 m normal to the wall. Sampling considerations suggest that image plane points 
having an absolute velocity value greater than 1.5 pixels per frame be eliminated. This
effectively confines the sample points to a circle centered at the focus of expansion whose 
radius decreases with the distance from the wall. 

The nature of the simulation is such that data points corresponding to the far zone are 
created before those corresponding to the close one. Since the first classification method 
is sensitive to the ordering of the data, we apply a mixing algorithm that rearranges the 
data in a random order. The last step in preparing the data is dividing it into two exclusive 
sets: the first, consisting of about 70 percent of the sample points, is used for training the 
classifiers, while the other is used for testing their performance on new unlearnt data.
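The mixing and splitting step can be sketched in Python with the standard `random` module. The function name, the fixed seed and the rounding of the split point are assumptions of this example; the paper specifies only a random reordering and a training share of about 70 percent.

```python
import random


def shuffle_and_split(samples, train_fraction=0.7, seed=0):
    """Randomly reorder samples, then split them into two disjoint sets.

    Returns (training_set, test_set). Shuffling first removes the
    far-before-close ordering produced by the simulated approach.
    """
    mixed = list(samples)
    random.Random(seed).shuffle(mixed)
    cut = round(train_fraction * len(mixed))
    return mixed[:cut], mixed[cut:]


train, test = shuffle_and_split(range(100))
```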

We applied the two classification algorithms described in the previous section to each
of the simulated cases. Allowing 10 training epochs for the Rosenblatt learning algorithm,
the classifier based on this method was about 10 times slower than the one employing Hebbian
learning, but it produced more accurate results in most cases. We shall refer to the two
algorithms as the slow one and the fast one, respectively.


4.2 Experimental results 

Experiment 1 

We first simulated a flight path normal to the wall without maneuvers. The close and the 
far zones are separated by a deadband of 40 m. The simulation created 10460 input-output 
pairs, of which 3157 were close and 7303 were far. After mixing the data, we selected 7500 
pairs for training and 2960 pairs for testing. 

The average error was 0.0929 for the fast algorithm and 0.0475 for the slow one. 

Experiment 2 

Reducing the deadband between the two zones to 10 m in a nonmaneuvering flight, 9985 
samples were created, of which 3824 were close and 6161 were far. The training set consisted 
of 7000 points and the testing set of 2985 points. 

The fast algorithm yielded an average error of 0.1272, and the slow one an average error of 
0.0928. 

Experiment 3 

Simulating a maneuvering flight with a 40 m deadband between the two zones, we produced 
3909 samples, of which 1390 were close and 2519 were far. The training set consisted of 3000 
points and the test set of 909 points. 

The results were 0.1418 for the fast algorithm and 0.0990 for the slow one. 

Experiment 4 

Repeating the maneuvering flight with a 10 m deadband, the total number of samples
generated was 5738, of which 2132 were in the close zone and 3606 in the far one. The training
set consisted of 4000 points and the test set of 1738 points. 

The fast classifier yielded an average error of 0.1944 for this case and the slow one an error 
of 0.0990. 

Experiment 5

The purpose of this experiment was to check the classifier’s response to motion consisting
of strictly lateral maneuvers. Such maneuvers present no danger of collision, and should be
classified as safe. Training was performed on the dataset of experiment 3. The test data
consisted of points, all representing a lateral maneuver, at a distance of 75 m from the wall.

The fast algorithm produced an average classification error of 0.0435 while for the slow one
the error was 0.0570. This shows that the classification methods are reliable in the sense
that they do not produce a substantial false-alarm rate, and that both classifiers perform
well in non-ambiguous situations.

Changing the grid 

Reducing the horizontal and the vertical distance between grid points from 3 to 2 pixels
caused the fast method to produce less accurate results. Increasing this distance to 4 pixels
reduced the error, but increasing it to 5 increased it again. (For instance, the average error 
produced by the fast method for experiment 4 increased from 0.1944 to 0.2382 when the 
distance was reduced from 3 to 2 pixels. The average error was 0.1655 for a distance of 4 
pixels and 0.1740 for a distance of 5 pixels). However, the performance of the slow method, 
which is more accurate, did not follow the same trend (the average error for a grid distance 
of 4 pixels was 0.1584, higher than the value of 0.0990 obtained for a distance of 3 pixels). 
We conclude that the optimal distance between grid points was 3 for our experiment. 


5 Conclusion 

We have presented a method for obstacle detection from optical flow information employing 
two recently proposed classifiers. It is based on the local divergence of the optical flow in 
the image plane, which is inversely proportional to the time to collision. The input to the 
classifiers consists of binary pattern vectors, whose components are the signs of the first
spatial and temporal derivatives of the light intensity at several points in the neighborhood
of a given image point. The classifiers, detailed in [17] and [18], employ high-dimensional
sparse binary internal representations, and either Rosenblatt or Hebbian learning of the 
class assignment. Performance was tested by several scenario simulations, including direct 
approach, maneuvering approach and strictly lateral motion. With the slower, Rosenblatt-based
classifier, the rate of success for each of these cases was over 90 percent.